UNED at RepLab 2012: Monitoring Task
Authors
Abstract
This paper describes the UNED participation in the RepLab 2012 Monitoring Task. Given an entity and a stream of tweets containing the entity’s name, the task consists of grouping the tweets into topics and then ranking the identified topics by priority. We tested three different systems for the clustering problem: (i) agglomerative clustering based on term co-occurrences; (ii) a clustering method that uses “wikified” tweets, where each tweet is represented by a set of Wikipedia entries that are semantically related to it; and (iii) Twitter-LDA, a topic modeling approach that extends LDA to account for some of the intrinsic properties of Twitter data. For the ranking problem, we rely on the insight that the priority of a topic depends on the sentiment expressed in the subjective tweets that refer to it. Although none of the proposed systems outperforms the official baseline on average, our systems obtain reasonably high precision (i.e., high Reliability scores). The average sentiment of a topic appears to be a useful indicator of priority, which merits further study. Finally, topics with a high ratio of unrelated tweets are difficult to group correctly, suggesting the need for explicit treatment of ambiguity.
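The abstract does not spell out the algorithms, so the following Python sketch is only a simplified stand-in for the cluster-then-rank pipeline, not the UNED system. It swaps in single-link agglomerative clustering over the Jaccard overlap of tweet term sets (in place of the paper's term co-occurrence method) and then orders the resulting topics by the average sentiment of their subjective tweets. The per-tweet sentiment scores, the 0.3 merge threshold, and the assumption that more negative topics deserve higher priority are all hypothetical.

```python
from itertools import combinations
from statistics import mean


def tokenize(text):
    # Lowercase whitespace tokenization with light punctuation stripping.
    # Assumption: a real system would also drop stopwords and the entity name.
    return {t.strip(".,!?:;\"'").lower() for t in text.split()} - {""}


def jaccard(a, b):
    # Overlap of two term sets; 0.0 when both sets are empty.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_tweets(tweets, threshold=0.3):
    """Single-link agglomerative clustering (a stand-in, not the paper's method).

    Starts with one cluster per tweet and repeatedly merges the pair of clusters
    whose most similar tweet pair has Jaccard similarity >= threshold (the
    hypothetical default 0.3), until no such pair remains.
    """
    terms = [tokenize(t) for t in tweets]
    clusters = [[i] for i in range(len(tweets))]
    while True:
        best_pair, best_sim = None, threshold
        for a, b in combinations(range(len(clusters)), 2):
            # Single link: cluster similarity is the max over cross-cluster tweet pairs.
            sim = max(jaccard(terms[i], terms[j])
                      for i in clusters[a] for j in clusters[b])
            if sim >= best_sim:
                best_pair, best_sim = (a, b), sim
        if best_pair is None:
            return clusters
        a, b = best_pair  # combinations guarantees a < b, so deleting b is safe
        clusters[a].extend(clusters[b])
        del clusters[b]


def rank_topics(clusters, sentiments):
    """Order clusters by the mean sentiment of their subjective tweets.

    Assumptions: sentiments[i] is a hypothetical score in [-1, 1] for tweet i,
    0 means objective (excluded from the mean), and the most negative topic
    comes first. Clusters with no subjective tweet get a neutral 0.0.
    """
    def avg_sentiment(cluster):
        scores = [sentiments[i] for i in cluster if sentiments[i] != 0]
        return mean(scores) if scores else 0.0

    return sorted(clusters, key=avg_sentiment)


if __name__ == "__main__":
    tweets = [
        "love the new Acme phone design",
        "new Acme phone design looks great",
        "Acme battery recall announced today",
        "battery recall at Acme is a disaster",
    ]
    sentiments = [0.8, 0.6, -0.5, -0.9]
    topics = cluster_tweets(tweets)
    print(topics)                           # [[0, 1], [2, 3]]
    print(rank_topics(topics, sentiments))  # [[2, 3], [0, 1]]
```

With these toy tweets the two phone-design tweets and the two recall tweets form separate topics, and the recall topic is ranked first because its subjective tweets are negative on average. Single linkage is chosen here only because it is short to write; it is prone to chaining, which is one reason topics mixing related and unrelated tweets are hard to separate.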
Similar resources
UNED at CLEF RepLab: Author Profiling
This paper describes a learning system developed for the RepLab 2014 author profiling task at UNED. The system uses a voting model, which employs a small set of features based mainly on the tweet text information such as POS tags, number of hashtags or number of links. In the unofficial run, the feature set was increased with Twitter metadata such as number of followers or retweet speed. The sy...
UNED-READERS: Filtering Relevant Tweets using Probabilistic Signature Models
This paper describes the (unsupervised) knowledge-based approach, followed by the UNED-READERS system at RepLab 2013, for filtering relevant tweets for a given entity. The approach relies on a new way of contextualizing entity names from relatively large and broad collections of texts using probabilistic signature models (i.e., discrete probability distributions of words lexically related to the...
UNED Online Reputation Monitoring Team at RepLab 2013
This paper describes the participation of UNED’s Online Reputation Monitoring Team in RepLab 2013 [3]. Several approaches were tested: first, an instance-based learning approach that uses Heterogeneity Based Ranking to combine seven different similarity measures was applied to all the subtasks. The filtering subtask was also tackled by automatically discovering filter keywords: those whose presen...
Overview of RepLab 2012: Evaluating Online Reputation Management Systems
This paper summarizes the goals, organization and results of the first RepLab competitive evaluation campaign for Online Reputation Management Systems (RepLab 2012). RepLab focused on the reputation of companies, and asked participant systems to annotate different types of information on tweets containing the names of several companies. Two tasks were proposed: a profiling task, where tweets ha...
Lexical and Machine Learning Approaches Toward Online Reputation Management
With the popularity of social media, people are increasingly interested in mining opinions from it. Learning from social media is valuable not only for research but also for business. RepLab 2012 included a Profiling task and a Monitoring task for understanding company-related tweets. The Profiling task aims to determine the Ambiguity and Polarity of tweets. In order to determine this Ambiguity an...